NASA Technical Memorandum 106352




Neural Networks for Calibration Tomography 


Arthur Decker 
Lewis Research Center 
Cleveland, Ohio 


Prepared for the

1993 International Symposium on Optics, Imaging, and Instrumentation
sponsored by the Society of Photo-Optical Instrumentation Engineers
Conference 2005 Optical Diagnostics in Fluid and Thermal Flow

San Diego, California, July 11—16, 1993 




NASA 


(NASA-TM-106352) NEURAL NETWORKS
FOR CALIBRATION TOMOGRAPHY (NASA)
10 p

N93-26906
Unclas

G3/35 0164791



ERRATA 


NASA Technical Memorandum 106164 


Arthur Decker 

National Aeronautics and Space Administration 
Lewis Research Center 
Cleveland, Ohio 44135 


The report number for the aforesaid Technical Memorandum is corrected to NASA Technical Memorandum 106352.




NEURAL NETWORKS FOR CALIBRATION TOMOGRAPHY 
SUMMARY


Artificial neural networks are suitable for performing pattern-to-pattern calibrations. These calibrations are potentially
useful for facilities operations in aeronautics, the control of optical alignment, and the like. This paper compares computed 
tomography with neural net calibration tomography for estimating density from its x-ray transform. X-ray transforms are 
measured, for example, in diffuse-illumination, holographic interferometry of fluids. Computed tomography and neural net 
calibration tomography are shown to have comparable performance for a 10 degree viewing cone and 29 interferograms 
within that cone. The system of tomography discussed is proposed as a relevant test of neural networks and other parallel 
processors intended for using flow visualization data. 


1. INTRODUCTION 

Efficient processing and utilization of whole-field or flow visualization data is especially challenging. The outputs of
thousands to millions of data channels must be translated into concrete measurements, control actions, or other decisions.
Artificial neural networks and kindred approaches like fuzzy logic are ways to perform these mappings efficiently. Neural 
networks can be trained or calibrated to convert flow visualization data into specific information. There are strong incentives 
to use neural networks for such parallel processing. They include the growing commercial availability of supporting software 
and hardware and the chance to provide test-facility or operator customized utilization of optical sensor data. 

Neural networks are not necessarily easy to use for calibration. An operator must design and provide representative sets 
of training and test examples. The design, training, and test phases require answering questions about the formatting of input 
and output vectors, noise immunity, the extrapolation and interpolation capability of the network and the like. The need to 
answer these questions weighs heavily in favor of performing neural net calibrations in test facilities with the direct 
participation of an expert user. The author is using that approach at Lewis Research Center. But he also feels constrained
to ask the following question. Are there benchmark techniques that are so clearly defined, and so representative of the
general difficulty of using whole-field data in aeronautics, as to provide convincing, generic testing of the effectiveness of
neural net calibration or any other approach? One benchmark test has been proposed and demonstrated that requires
pattern-based alignment of spatial filters. 2,3 This paper discusses a test that uses flow visualization data.

The benchmark technique discussed herein is tomography as represented by a system of computed tomography developed 
by S. H. Izen. 4-6 This system was developed originally to measure the information recovery potential of tomography
performed within a limited angular range of viewing directions. The system will in general invert the x-ray transform (line 
projections) of a density distribution, where the density is nonzero within a sphere of unit radius. Interferograms were 
intended to supply the x-ray transform experimentally. A single diffuse illumination hologram will provide multiple 
interferograms within a cone angle of about 10 degrees. Computing density from these interferograms is extremely ill posed. 
Computational or calibration procedures that perform robustly for such a problem should have a high degree of credibility. 
The tomography system is contained conveniently in a software package that accepts measured data or computes projections 
for phantom data. 7 The package handles arbitrary viewing cone angles; it can be used to add noise to computed data to test 
noise immunity. 

This paper discusses the training and performance of artificial neural networks which were trained with phantoms created 
by the tomography package. The phantoms consist of functions which are orthogonal on the sphere of unit radius and linear 
combinations of these functions. The effectiveness of the training or calibration procedure, which maps interferograms onto 
density distributions, is tested by comparing the results with those of computed tomography. Tests are performed with clean 
and noisy interferograms. The system of computed tomography itself is discussed in the next section.

2. COMPUTED TOMOGRAPHY 

The software package was used in black box fashion to generate the training sets for the neural networks as well as to
perform computed tomography for comparison. The following discussion is limited to the specific forms used for this work;
the general theory, approach, and terminology of tomography are discussed in the references.

The density distribution is assumed to be nonzero on a sphere of unit radius. A particular representation of the density 
f in spherical coordinates is given by the infinite series 
where P.** is a Jacobi polynomial (t = 0 herein) and P_n^m is an associated Legendre function. In the second summation,
the index m_0 is incremented in units of 2, beginning at 0 for s even and at 1 for s odd.

The objective of tomography is to estimate the coefficients a_{s m_0 m_1} from the x-ray transform of the density field. The x-ray
transform in interferometry is the interference phase measured on planes perpendicular to all possible directions on a unit half
sphere. The projection slice theorem relates the two-dimensional Fourier transforms (or their inverses) of the interference 
phase measurements from an interferogram to the three-dimensional Fourier transform (or its inverse) of density f . 
Specifically, each inverse two-dimensional Fourier transform is proportional to the corresponding inverse three dimensional 
Fourier transform on a plane parallel to the interferogram and through the center of the transform space. The proportionality 
factor is 2π.
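
The projection slice theorem can be checked numerically in a reduced setting. The sketch below is a two-dimensional, discrete analog and not the three-dimensional formulation used by the tomography package: it sums a random array along one axis to form a projection and compares the one-dimensional discrete Fourier transform of that projection with the central row of the two-dimensional transform. The function name, grid size, and random test array are illustrative assumptions. With NumPy's unnormalized transform the proportionality factor is 1; the constant depends on the Fourier convention adopted.

```python
import numpy as np

def central_slice_mismatch(n=16, seed=0):
    """Compare the FT of a projection with the central slice of the full FT.

    Two-dimensional analog: project a random n x n "density" along axis 0,
    then check the 1-D DFT of the projection against row 0 of the 2-D DFT.
    """
    rng = np.random.default_rng(seed)
    density = rng.standard_normal((n, n))
    projection = density.sum(axis=0)              # discrete line integrals
    central_slice = np.fft.fft2(density)[0, :]    # slice through the origin
    return np.max(np.abs(np.fft.fft(projection) - central_slice))

mismatch = central_slice_mismatch()
print(f"max |FT(projection) - central slice| = {mismatch:.2e}")
```

The mismatch is at the level of floating point round-off, which is the discrete counterpart of the statement that each projection supplies one plane through the center of the transform space.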

The inverse Fourier transform of (1) is given in spherical coordinates by a very similar expression. Simply replace the 
spherical coordinates in R^3 by spherical coordinates in inverse Fourier R^3, and effect the replacement
where q is the radial coordinate in inverse Fourier R^3, and J_ν is a Bessel function of the first kind.

The projection slice theorem then allows a separate equation in the coefficients to be generated for each transformed point 
in an interferogram. From now on, interferograms, interference phase patterns, x-ray transforms, and Fourier transforms of
these quantities will not be distinguished explicitly. Keep in mind that interference phase is measured from interferograms;
that the interference phase, plus or minus a reference, is proportional to the x-ray transform; and that the two-dimensional
inverse Fourier transforms of the interferometric data appear in the solutions for the coefficients.

The practical implementation of this tomographic procedure requires that the series represented by (1) and (2) be
truncated, that the continua required by the theorems be replaced by discrete samples, and that the consequent simultaneous
equations be solved. Truncating the series at s = S yields equations in N coefficients where N is given by

    N = (S + 1)(S + 2)(S + 3)/6        (3)

The choices of number of coefficients, number of interferograms, and number of measurements per interferogram depend
on the computing resources conveniently available. This comparison of computed tomography with neural net calibration
tomography was sized for an SGI 4D/25 workstation with 24 megabytes of memory. The choice of number of coefficients
was S = 8 for N = 165 coefficients. This particular study was confined to a viewing cone angle of 10 degrees, where
the cone angle is measured from the axis of the cone (the full range of viewing directions is 20 degrees). This choice is
realistic for much of the propulsion testing done in aeronautics. There were 29 interferograms, 28 of them clustered about
a central interferogram, and each interferogram was sampled for phase on a 16 × 16 grid. There are consequently
29 × 16 × 16 = 7424 equations to solve for 165 coefficients in tomography and input vectors containing 7424 elements
to be used to train the neural networks.
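
The problem sizes quoted above can be reproduced by direct enumeration. The sketch below assumes that the index m_0 runs up to s in steps of 2 and that m_1 runs from -m_0 to m_0; the text states only the starting value and step of m_0, so these bounds are assumptions, chosen because they reproduce the 165 coefficients and the 25 azimuthally symmetrical functions reported in this paper. The function name is hypothetical.

```python
def expansion_indices(S):
    """List (s, m0, m1) triples: m0 has the parity of s, |m1| <= m0 (assumed bounds)."""
    return [(s, m0, m1)
            for s in range(S + 1)
            for m0 in range(s % 2, s + 1, 2)   # 0, 2, ... for s even; 1, 3, ... for s odd
            for m1 in range(-m0, m0 + 1)]

S = 8
indices = expansion_indices(S)
n_coeff = len(indices)                                  # number of coefficients N
n_symmetric = sum(1 for (_, _, m1) in indices if m1 == 0)
closed_form = (S + 1) * (S + 2) * (S + 3) // 6          # closed form implied by the bounds
n_inputs = 29 * 16 * 16                                 # interferograms x grid samples

print(n_coeff, n_symmetric, closed_form, n_inputs)
```

Under these assumptions the enumeration agrees with the counts used throughout the paper, which gives some confidence in the assumed index ranges.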

The final comments of this section pertain to solving for the coefficients. The problem is extremely ill-posed and the
matrices extremely ill-conditioned. Singular value decomposition (SVD) is the correct procedure for handling that kind of
problem. 8,9 Computed tomography with SVD minimizes |Aa - b| where a is the vector of coefficients, b is the vector
of Fourier transformed data, and A is a matrix that depends only on the geometry of the system. SVD yields column
orthogonal matrices U and V and a diagonal matrix W of the singular values so that a is estimated from V W^-1 U^T b,
where W^-1 is a diagonal matrix containing the reciprocals of the singular values. SVD itself depends only on the geometry
defining matrix A; it allows all b in the range of A to be reached and produces a best estimate of a for b that cannot
be reached. SVD is done exactly once per geometry, where geometry again is determined by the viewing cone angle, the
locations of the interferograms, and the locations of the data points. Measurement errors or machine precision are considered
after the fact in deciding which singular values to retain. A particular density distribution is then estimated by substituting
the calculated a in the truncated version of (1).
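
The SVD estimate described above can be sketched with NumPy. This is a minimal illustration, not the tomography package's code: the function name, the small random test matrix, and the duplicated column used to create a vanishing singular value are all assumptions. Reciprocals of singular values smaller than the largest one divided by `cutoff_ratio` are set to zero; the default of 50 follows the retention criterion quoted in the comparison section.

```python
import numpy as np

def truncated_svd_solve(A, b, cutoff_ratio=50.0):
    """Estimate a from A a = b, zeroing reciprocals of small singular values."""
    U, w, Vh = np.linalg.svd(A, full_matrices=False)   # A = U diag(w) Vh
    keep = (w * cutoff_ratio >= w[0]) & (w > 0)         # w is sorted, w[0] is largest
    w_inv = np.zeros_like(w)
    w_inv[keep] = 1.0 / w[keep]
    # a = V W^-1 U^H b, with discarded directions contributing nothing
    return Vh.conj().T @ (w_inv * (U.conj().T @ b))

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 6))
A[:, 5] = A[:, 4]                    # duplicate column: one singular value is ~0
a_true = rng.standard_normal(6)
b = A @ a_true                       # b lies in the range of A
a_est = truncated_svd_solve(A, b)
residual = np.linalg.norm(A @ a_est - b)
print(f"residual = {residual:.2e}")
```

Because b lies in the range of A, the residual is at round-off level even though A is singular. The individual coefficients of the duplicated columns cannot be separated, and the estimate shares their sum equally between them.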

3. NEURAL NET CALIBRATION TOMOGRAPHY 

Training a neural net, by contrast, is strictly a calibration procedure. An operator provides a set of training examples. Each 
training example associates a discretely sampled density output [f(r_j, φ_j, θ_j)] with an input [I] consisting of sampled
interferograms. The interferogram data might consist of interferogram intensity measurements, interference phase
measurements, Fourier transformed measurements such as b in the previous section, or measurements transformed in any
other fashion. The training set then consists of a set of training records (I_l, f_l) where I_l and f_l are the input and output
vectors for training record l. Training the particular neural net architectures used for this study requires minimizing E
given by
where the O_l are the actual outputs generated by the network in response to the training input vectors I_l. Artificial neural
networks are discussed extensively in the literature. An overview is contained, for example, in the references listed here. 10-12
The networks used for this study consisted of interconnected nonlinear processing elements with weighted input connections. 
Various algorithms can be used to adjust the weights in order to minimize E in (4). The signature algorithm is 
backpropagation; it approximates a steepest descent minimization of E as expressed in weight space. There are many 
versions and modifications of this algorithm as well as alternative algorithms. 
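
The weight adjustment can be illustrated with a small NumPy sketch of backpropagation for a feed forward network with one hidden layer of tanh nodes. It is not the commercial package used in this study. The sum of squared errors with a factor of 1/2, the learning rate, the initial weight scale, the bias terms, and the toy training records are all assumptions made for the example.

```python
import numpy as np

def train_tanh_net(I, f, n_hidden=3, lr=0.1, epochs=1000, seed=0):
    """Batch gradient descent on E = 0.5 * sum |O - f|^2 for a one-hidden-layer tanh net."""
    rng = np.random.default_rng(seed)
    n_records, n_in = I.shape
    n_out = f.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)
    history = []
    for _ in range(epochs):
        H = np.tanh(I @ W1 + b1)                 # hidden-layer outputs
        O = np.tanh(H @ W2 + b2)                 # network outputs
        err = O - f
        history.append(0.5 * np.sum(err ** 2))   # error E before this update
        delta_out = err * (1.0 - O ** 2)         # dE/d(net input) at output nodes
        delta_hid = (delta_out @ W2.T) * (1.0 - H ** 2)   # error propagated back
        W2 -= lr * (H.T @ delta_out) / n_records
        b2 -= lr * delta_out.sum(axis=0) / n_records
        W1 -= lr * (I.T @ delta_hid) / n_records
        b1 -= lr * delta_hid.sum(axis=0) / n_records
    return (W1, b1, W2, b2), history

# Toy training records (hypothetical): 20 inputs of length 8, outputs within [-0.8, 0.8].
rng = np.random.default_rng(0)
I_train = rng.uniform(-1.0, 1.0, (20, 8))
f_train = 0.8 * np.tanh(I_train @ rng.standard_normal((8, 2)))
weights, history = train_tanh_net(I_train, f_train)
print(f"E before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The error decreases over the run, but nothing in the procedure indicates whether the value reached is the absolute minimum.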

The performance of a trained neural network is determined by testing, since there really is no way to know whether E
has reached its absolute minimum, no guarantee that a network will interpolate or extrapolate adequately, and no indication
of noise immunity. An operator must judge that a neural network is performing adequately and use test records, different
from the training records, to build confidence in the trained neural network.

A commercial software package was used to create, train, and test the neural networks used for this study. 13 These 
activities were executed in black box fashion as with computed tomography. The actual comparison of computed and neural 
net calibration tomography is discussed next. 

4. COMPARISON 


Computed tomography and calibration tomography were, as stated, compared through degree 8 and with 16 × 16
interferograms. The tomography study in reference 6 was performed through degree 12 and with 32 × 32
interferograms. That study required a mainframe computer. This paper is intended, not as a discussion of tomography, but 
rather as a discussion of testing artificial neural networks in a manner useful to the aeronautics field. 

There are 165 orthonormal functions for S = 8, and there are 165 independent real and imaginary parts. The software
package was used to compute interferograms for each of these real and imaginary parts, which are designated, respectively,
as e s m_0 m_1 r and e s m_0 m_1 i. For example, the real part with s = 5, m_0 = 3, and m_1 = 1 would be called e531r. Note
that only |m_1| is required. The software package was also used to compute a 16 × 16 × 16 sampling of the
corresponding density function f.

Eight other phantoms and their interferograms were also calculated. Three consisted of linear combinations of the 25
azimuthally symmetrical (m_1 = 0) functions. A phantom designated 11 contained the first 10; a phantom designated 12
contained the second 10; and a phantom designated 13 contained the final 5. The ordering of the functions is with m_0
increasing most rapidly and s increasing next most rapidly. A fourth phantom was a sphere of unit diameter, and is
designated as sphere. Four other phantoms were created from various arbitrary linear combinations of the orthonormal
functions. Noise was added to some interferograms for testing and, in one case, for training a neural network.

Computed tomography requires only that the interferograms and their geometry be supplied along with a criterion for 
zeroing singular values. Only singular values within a factor of 50 of the maximum singular value were retained. Roughly 
speaking, this choice corresponds to having infinite fringe interferograms measured to between 1/50 and 1/100 fringe. 
In fact, the particular system discussed herein showed better noise immunity than this precision implies. 


Fig. 1. Sketch of artificial neural network with input, hidden, and output layers.
Only connections to and from one hidden-layer node are shown.


Training the neural networks requires the density function f(r, φ, θ) as well. The decision was to use only 16 density
values for training the neural networks. In fact, there are only 14 variable points, since the 2 boundary points are always
0. Recall that there are 7424 inputs from the 29 interferograms of 16 × 16 values each. The actual number of density
values that the software can handle depends on the architecture of the neural network. Figure 1 shows a simple feed forward
neural network containing one so-called hidden layer of nodes or nonlinear processing elements. A large number of both
inputs and outputs can be accommodated if the hidden layer contains a few nodes and there are no connections between the
input and output layers. But connections between the input and output layers reduce significantly the number of output nodes
that can be accommodated by the computer memory. The line of 16 density values is chosen to be parallel to the "hard"
direction for tomography. That line is parallel to the viewing cone axis (z axis) which, for example, would be perpendicular
to the hologram in holographic interferometry.
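
The memory argument can be made concrete by counting weights. The sketch below is a rough estimate only: the function name is hypothetical, one bias per node and 4 bytes per weight are assumptions, and the storage actually used by the commercial package is not known from this paper.

```python
def weight_count(n_in, n_hidden, n_out, prior_connections=False, biases=True):
    """Count adjustable weights in a network with one hidden layer."""
    n = n_in * n_hidden + n_hidden * n_out       # input->hidden and hidden->output
    if prior_connections:
        n += n_in * n_out                        # direct input->output connections
    if biases:
        n += n_hidden + n_out                    # one bias per node (assumed)
    return n

n_in = 29 * 16 * 16                              # 7424 interferogram samples
feed_forward = weight_count(n_in, 3, 16)         # 3 hidden nodes, 16 density values
with_prior = weight_count(n_in, 16, 16, prior_connections=True)

# Direct connections from every input to a full 16 x 16 x 16 density grid:
direct_full_grid = n_in * 16 ** 3
bytes_full_grid = 4 * direct_full_grid           # assuming 4-byte weights
workstation_bytes = 24 * 2 ** 20                 # 24 megabytes

print(feed_forward, with_prior, direct_full_grid, bytes_full_grid > workstation_bytes)
```

Under these assumptions, direct connections to a full 16 × 16 × 16 output grid would need several times the memory of the workstation, whereas a line of 16 density values remains modest for either architecture.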

The so-called interferograms (x-ray transforms) are generally between plus or minus unity. The same comment applies
to the density. The density was normalized in the range [-0.8, 0.8] to accommodate the nonlinear transfer function or activation
function of the neural net nodes. That function was tanh(input) where input was the sum of weighted inputs to the node.
The noise capability of the tomography software package can be used to add normally distributed noise to the interferograms. 
The standard deviation of the noise is a percentage specified by the user. Figure 2a shows the density distribution 
corresponding to e440r , and fig. 2b shows the central interferogram (plot of the x-ray transform). Figure 2c shows an 
interferogram to which 32 percent noise has been added. The results of comparing computed with neural net calibration 
tomography are discussed next. 
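
The normalization and noise steps can be sketched as follows. This is an illustration under stated assumptions, not the tomography package's procedure: the paper does not say how the density was scaled into [-0.8, 0.8] or what quantity the noise percentage refers to. Here the density is scaled by its peak absolute value, and the noise standard deviation is taken as a percentage of the peak absolute value of the interferogram. Both function names and the stand-in interferogram are hypothetical.

```python
import numpy as np

def normalize_density(f, limit=0.8):
    """Scale f so that its peak absolute value equals limit (assumed scheme)."""
    peak = np.max(np.abs(f))
    return f if peak == 0 else f * (limit / peak)

def add_percent_noise(x, percent, rng):
    """Add normally distributed noise; sigma is a percentage of max |x| (assumed)."""
    sigma = (percent / 100.0) * np.max(np.abs(x))
    return x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
density_line = np.sin(np.linspace(0.0, np.pi, 16)) * 1.7    # 16 values along the line
targets = normalize_density(density_line)                   # suitable for tanh outputs
interferogram = rng.uniform(-1.0, 1.0, (16, 16))            # stand-in for an x-ray transform
noisy = add_percent_noise(interferogram, 32.0, rng)
clean = add_percent_noise(interferogram, 0.0, rng)
print(np.max(np.abs(targets)), noisy.shape)
```

Keeping the targets inside [-0.8, 0.8] keeps them away from the saturated ends of tanh, whose outputs approach plus or minus unity only for large inputs.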




Fig. 2. (a) Relative density for e440r along a line parallel to the viewing cone axis.
(b) Plot of x-ray transform for central interferogram of e440r. (c) Plot of x-ray
transform with 32 percent noise.


5. RESULTS 


Discussion of the results is confined to the symmetrical objects (m_1 = 0), since these objects measure performance
adequately in the hard direction. The performance for the other functions was similar. Note, however, that all orthonormal 
functions were used with the same relative normalization as in (1) for training the networks. Note also that the neural 
networks were trained with the x-ray transforms themselves, whereas the data for the tomography routine consists of Fourier 
transforms of the x-ray transforms. Training the neural networks on the Fourier transformed data would have been quite 
acceptable, but was not done. 

The SVD was done once, as mentioned, for the geometry discussed herein. Figure 3 shows the results for e110r,
e840r, 11, and sphere. The solid lines represent the theoretical values, and the dots represent the results calculated by
tomography. Again, the plots are parallel to the cone axis; they are slightly displaced from it because of the choice of a
16 × 16 × 16 grid. Perhaps the most significant result is the rather poor recovery of e840r by computed tomography for the
10 degree viewing cone angle (20 degree full view).

Fig. 3. Phantoms (solid lines) and their reconstructions (dots) by computed tomography.
Panels: f (e110r), f (e840r), f (11), f (sphere).

The neural networks used here fell into a pair of classes: the feed forward neural networks (fig. 1) and neural networks 
with prior connections. Each layer in a feed forward network receives inputs only from the previous layer. A network with 
prior connections receives inputs from all prior layers. A special case of a neural network with prior connections is a two 
layer network. A variety of training sets was created from the computed objects and their interferograms. Some examples 
were: a 169 element training set composed of the 165 orthogonal functions and 4 arbitrary linear combinations; a 28
element training set composed of the 25 symmetrical orthogonal functions and 11, 12, and 13; and a 26 element training
set composed of the 25 symmetrical orthogonal functions and sphere. Then, 11, 12, 13, and sphere are suitable test
objects for nets trained on the first training set; sphere is a suitable test object for nets trained on the second training set; and
11, 12, and 13 are suitable test objects for nets trained on the third training set.

The biggest differences in training with these different training sets were between nets with and without prior connections. 
Nets with prior connections memorize their training examples as well as computed tomography evaluates them. Figure 4 
shows the results generated by such a net. The net had one hidden layer with 16 nodes as well as the 7424 input and 
16 output nodes. The output layer was connected to the input layer as well as the hidden layer, and the network was trained 
on the 169 member training set. Note that the inadequate recovery of e840r is the same as that for computed tomography
in fig. 3. But testing showed that the net does not have very good noise immunity, and that it neither extrapolates for sphere
nor interpolates for 11 very well.


Fig. 4. Phantoms (solid lines) and their reconstructions (dots) by a neural network
with prior connections. Panels: f (e110r), f (e840r), f (11), f (sphere).

Figure 5 shows a result for a feed forward network with only 3 hidden nodes. This network had more trouble learning 
the training examples, but its extrapolation abilities are improved. The network was trained with the 28 record training set. 
The network tolerates noise much better than the network of fig. 4. 
Fig. 5. Phantoms (solid lines) and their reconstructions (dots) by a neural network
with 3 hidden nodes and no prior connections.

Figure 6a shows the response of computed tomography to interferograms with 32 percent noise, and fig. 6b shows the 
response of the net with 3 hidden nodes to the same interferograms. Keep in mind from fig. 2 that these interferograms 
are practically unrecognizable to the eye. The recovered density functions don’t change much in spite of that. 






Fig. 6. Effect of 32 percent noise on (a) computed tomography and 
(b) neural net calibration tomography by a neural network with 3 hidden 
nodes and no prior connections. Solid lines represent phantoms and dots 
represent reconstructions. 

6. CONCLUDING REMARKS 


A system of computed tomography is worth considering for testing methods for efficient, parallel processing of whole-field
or flow visualization data. Computed tomography and calibration tomography using neural networks were both able to
perform robustly for the 10 degree viewing cone and the direction parallel to the axis of that cone. Both methods display 
comparably the rather limited information available and continue to do so even in the presence of noise. I suspect that a 
counterpropagation network 14 might have performed better than the networks actually used. The system of computed 
tomography could be used to test this conjecture. 